Conversation

@netanel-haber
Collaborator

@netanel-haber netanel-haber commented Jun 26, 2025

58a8a8f - these changes were previously merged to main here.
6aef149 - the changes were temporarily reverted in main, due to a significant perf regression in models using the TorchSampler (observed by @byshiue).
This PR is meant to re-merge these changes along with a fix to prevent the regression.

The first commit of this PR is actually just the reverted revert - filter it out of the changes to see previously unmerged changes.

@Funatiq Funatiq requested review from Funatiq June 26, 2025 17:42
@tensorrt-cicd
Collaborator

PR_Github #10145 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10145 [ skip ] completed with state SUCCESS
Skipping testing for commit 700c079

@dcampora
Collaborator

Please ignore the skip, I triggered it by mistake on this PR.

@netanel-haber netanel-haber force-pushed the user/nhaber/feature/fixed_align_sample_state_with_trtllm_sampler_sample_state branch 3 times, most recently from 6874175 to 035e67a on June 29, 2025 15:57
Signed-off-by: Netanel Haber <[email protected]>

minimize diff

Signed-off-by: Netanel Haber <[email protected]>

minimize diff

Signed-off-by: Netanel Haber <[email protected]>
@netanel-haber netanel-haber force-pushed the user/nhaber/feature/fixed_align_sample_state_with_trtllm_sampler_sample_state branch from 6397d52 to 84138c6 on June 29, 2025 16:06
…_with_trtllm_sampler_sample_state

Signed-off-by: Netanel Haber <[email protected]>
@netanel-haber netanel-haber marked this pull request as ready for review June 29, 2025 16:22
@netanel-haber netanel-haber requested review from a team as code owners June 29, 2025 16:22
@netanel-haber
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #10242 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10242 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7569 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Collaborator

@wili-65535 wili-65535 left a comment


Great work simplifying the samplers! LGTM on my side.

@netanel-haber netanel-haber enabled auto-merge (squash) June 30, 2025 10:37
@Funatiq Funatiq requested a review from Copilot June 30, 2025 10:49

This comment was marked as outdated.

@netanel-haber netanel-haber force-pushed the user/nhaber/feature/fixed_align_sample_state_with_trtllm_sampler_sample_state branch from a68c9fd to 051fe4a on June 30, 2025 13:27
@netanel-haber
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #10369 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10369 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7666 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Collaborator

@dcampora dcampora left a comment


Approving, as the perf issue is now fixed.

Collaborator

@suyoggupta suyoggupta left a comment


LGTM for AD changes

@suyoggupta suyoggupta requested review from Copilot and suyoggupta June 30, 2025 18:58
@netanel-haber netanel-haber merged commit 6ee94c7 into NVIDIA:main Jun 30, 2025
3 checks passed
Contributor

Copilot AI left a comment


Pull Request Overview

This PR re-applies previously reverted speculative decoding changes and fixes a performance regression in TorchSampler by unifying the new_tokens state format and refactoring sampler interfaces across the codebase.

  • Refactored get_spec_decoder to accept TorchSampler.Args and integrated TorchSampler in speculative modes.
  • Overhauled TorchSampler API: introduced Args/Store dataclasses, generic sampling helpers, and unified sample_async/update_requests.
  • Removed legacy sampler classes (Eagle3Sampler, Eagle3Decoder), updated resource managers and scheduler to use all_requests.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file

  • tensorrt_llm/_torch/speculative/utils.py: Updated get_spec_decoder signature and error handling for unsupported modes.
  • tensorrt_llm/_torch/speculative/mtp.py: Adapted MTPSampler to new TorchSampler.Args and simplified stop-criteria calls.
  • tensorrt_llm/_torch/speculative/eagle3.py: Removed legacy Eagle3 sampler classes, added Eagle3OneModelSampler.
  • tensorrt_llm/_torch/pyexecutor/seq_slot_manager.py: Switched loops to use scheduled_batch.all_requests().
  • tensorrt_llm/_torch/pyexecutor/scheduler.py: Simplified all_requests to return a list instead of a chain.
  • tensorrt_llm/_torch/pyexecutor/sampler.py: Major refactor of TorchSampler: new dataclasses, unified sampling functions, updated state.
  • tensorrt_llm/_torch/pyexecutor/py_executor.py: Propagated max_num_sequences, integrated SeqSlotManager, adjusted logit fields.
  • tensorrt_llm/_torch/pyexecutor/model_engine.py: Updated batch-index logic (py_batch_idx) and input preparation to the new sampler format.
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py: Updated TorchSampler instantiation to use Args and added SeqSlotManager.
Comments suppressed due to low confidence (3)

tensorrt_llm/_torch/pyexecutor/sampler.py:98

  • Add unit tests for top_k_sampling_batch, top_p_sampling_batch, and the generic sample pipeline to validate sampling distributions, edge cases (e.g., top_k=1, top_p=0.0), and correct handling of tensor dimensions.
def top_k_sampling_batch(logits, top_k=50):

tensorrt_llm/_torch/pyexecutor/sampler.py:180

  • [nitpick] Add a docstring explaining this helper's purpose, the expected format of strategy, logits, and what is returned (next_tokens and softmax probabilities).
def sample(strategy: Strategy, logits: torch.Tensor):

tensorrt_llm/_torch/speculative/utils.py:113

  • [nitpick] Document this exception in the get_spec_decoder docstring so callers know it will raise for unknown modes, or consider returning None to match previous behavior if that was expected.
        f"Unsupported speculative decoding mode: {spec_config.spec_dec_mode}")
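The suppressed comment above asks for unit tests covering top_k_sampling_batch and its edge cases. As a hedged illustration only (the actual TensorRT-LLM helper may differ in signature and behavior), a minimal top-k sampler and the kind of edge-case check the reviewer describes could look like this:

```python
import torch

def top_k_sampling_batch(logits: torch.Tensor, top_k: int = 50):
    """Sample one token per row, restricted to the top_k highest logits.

    Hypothetical sketch of the helper named in the review comment,
    not the real TensorRT-LLM implementation.
    """
    values, _ = torch.topk(logits, top_k, dim=-1)
    # Mask out every logit strictly below the k-th largest per row.
    min_kept = values[..., -1, None]
    masked = logits.masked_fill(logits < min_kept, float("-inf"))
    probs = torch.softmax(masked, dim=-1)
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return next_tokens, probs
```

With top_k=1 this degenerates to greedy sampling, which is exactly the kind of edge case a unit test can pin down deterministically.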

@property
def all_requests(self) -> chain[LlmRequest]:
    return chain(self.context_requests, self.generation_requests)

def all_requests(self) -> list[LlmRequest]:
Copy link

Copilot AI Jun 30, 2025


[nitpick] Consider returning an Iterable[LlmRequest] or Sequence[LlmRequest] instead of forcing a new list allocation on each call, or change the return annotation to list explicitly to reflect that behavior.

Suggested change:
- def all_requests(self) -> list[LlmRequest]:
+ def all_requests(self) -> Sequence[LlmRequest]:

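The trade-off behind this suggestion is easy to demonstrate with a toy class (illustrative only, not the actual TensorRT-LLM scheduler): a chain avoids allocating a list but is single-pass and unsized, while a list allocates on every call but can be reused and measured.

```python
from itertools import chain
from typing import Sequence

# Toy stand-in for the scheduled batch discussed above.
class Batch:
    def __init__(self, context_requests, generation_requests):
        self.context_requests = context_requests
        self.generation_requests = generation_requests

    @property
    def all_requests_lazy(self):
        # chain: no list allocation, but the iterator is single-use
        # and supports neither len() nor indexing.
        return chain(self.context_requests, self.generation_requests)

    @property
    def all_requests_list(self) -> Sequence[str]:
        # list: allocates a fresh list on every call, but the result
        # can be iterated repeatedly, sliced, and measured with len().
        return list(self.context_requests) + list(self.generation_requests)

batch = Batch(["ctx0", "ctx1"], ["gen0"])
```

Annotating the list-returning version as Sequence[LlmRequest], as suggested, keeps the caller-facing contract loose while the implementation still allocates a list.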
new_tokens,
gen_logits_host=gen_logits_host,
log_probs_host=log_probs_host)
new_tokens_host = new_tokens.to(device="cpu", non_blocking=True)
Copy link

Copilot AI Jun 30, 2025


Transferring the entire new_tokens tensor to CPU each iteration can be costly. If only a subset of slots is active, consider slicing new_tokens to only copy relevant indices and reduce data movement overhead.

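The optimization the comment suggests can be sketched as follows. This is a hedged illustration: new_tokens and active_slots are stand-in names, not the real TorchSampler state, and the shapes are made up for the example.

```python
import torch

# Pretend (num_slots, tokens_per_step) tensor of sampled tokens.
new_tokens = torch.arange(12).reshape(4, 3)
# Hypothetical indices of the slots that are actually in flight.
active_slots = torch.tensor([0, 2])

# Full transfer: every row crosses the device boundary each iteration.
full_host = new_tokens.to(device="cpu", non_blocking=True)

# Sliced transfer: index_select materializes only the active rows,
# so the copy moves len(active_slots) rows instead of all of them.
sliced_host = new_tokens.index_select(0, active_slots).to(
    device="cpu", non_blocking=True)
```

Whether the slicing pays off depends on how sparse the active set typically is; if most slots are active on every step, the extra index_select can cost more than it saves.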
Shunkangz pushed a commit to Shunkangz/TensorRT-LLM that referenced this pull request Jul 2, 2025
…state to trtllm samper tokens format (NVIDIA#5513)

58a8a8f - these changes were previously merged to main here.
6aef149 - the changes were temporarily reverted in main, due to a significant perf regression in models using the TorchSampler (observed by @byshiue).
This PR is meant to re-merge these changes along with a fix to prevent the regression.

The first commit of this PR is actually just the reverted revert - filter it out of the changes to see previously unmerged changes.

Signed-off-by: Netanel Haber <[email protected]>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
…state to trtllm samper tokens format (NVIDIA#5513)

58a8a8f - these changes were previously merged to main here.
6aef149 - the changes were temporarily reverted in main, due to a significant perf regression in models using the TorchSampler (observed by @byshiue).
This PR is meant to re-merge these changes along with a fix to prevent the regression.

The first commit of this PR is actually just the reverted revert - filter it out of the changes to see previously unmerged changes.

Signed-off-by: Netanel Haber <[email protected]>